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Abstract. Our ability to extract the maximal amount of information 
from future observations at gigahertz frequencies depends on our ability 
to separate the underlying cosmic microwave background (CMB) from 
galactic and extragalactic foregrounds. We review the separation problem 
and its formulation within Bayesian inference, give examples of specific 
solutions with particular choices of prior density, and finally comment 
on the generalization of Bayesian methods to a multi-resolution frame- 
work. We propose a strategy for the regularization of solutions allowing a 
spatially varying spectral index, and discuss possible computational ap- 
proaches such as multi-scale stochastic relaxation. 
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1. Introduction 

Future observations from ground based, balloon borne, and satellite missions 
at gigahertz frequencies will contain a wealth of information. Our ability to 
extract the maximal amount of information from these experiments depends 
on our ability to separate the underlying cosmic microwave background (CMB) 
from galactic and extragalactic foregrounds. Anticipated components of the to- 
tal foreground emission include synchrotron, free-free, and dust emission in our 
own galaxy, plus various extragalactic point sources. The interesting cosmo- 
logical information, relevant for testing inflation and determining cosmological 
parameters, is found at sub-degree angular scales, but it is at these angular scales 
that we begin to resolve the microphysics of the galaxy, potentially complicating 
the separation of the underlying cosmic and foreground signals. 

It has been demonstrated that, for foregrounds simulated by extrapolat- 
ing existing observations to the relevant frequencies and spatial resolutions of 
the next generation of experiments, an accurate reconstruction of the CMB is 
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possible (Brandt et al. 1994, Tegmark & Efstatiou 1996, Hobson et al. 1998). 
However, these foreground scenarios involve overly simple assumptions about 
the emissivity or spatial structure of the foregrounds. These methods assume 
the foregrounds have a power law emissivity, I v oc is a , and that the spectral 
index a is known or can be adaquately inferred from the emission averaged over 
some spatial scale. The spatial structure of the variation of the spectral index, 
at the frequencies relevant for observations of the CMB, is in fact a priori un- 
known. Although further complexities in foreground models are a nuisance with 
respect to CMB extraction, they represent a "gold mine" of interesting physical 
quantities for ISM studies (A. Lazarian, 1998). 

Because of unknown complexities in the physics of the foregrounds, we have 
a family of models, in which previous assumptions, such as a spatially constant 
spectral index, are successively relaxed. Inference within models with increas- 
ingly many degrees of freedom becomes impossible (completely degenerate), so 
the ability to constrain solutions is necessary for the extraction of useful infor- 
mation from the data. Bayesian inference provides a unifying framework within 
which the separation problem can be addressed in its full range of complexity. 
In addition, the Bayesian framework can provide computational solutions for 
more complicated, yet physically motivated, models through algorithms includ- 
ing stochastic relaxation. 

In this article, we will review the separation problem and its formulation 
within Bayesian inference, give examples of specific solutions with particular 
choices of prior density, and finally comment on the generalization of Bayesian 
methods to a multi-resolution framework. The paper is meant to convey ideas 
of variations on current analysis strategies and present the range of possible 
approaches from linear filtering to mutli-scale stochastic relaxation. 



2. Review of the Separation Problem 

The anticipated galactic foregrounds contributing to the total detected emission 
at frequencies of interest for CMB observations include free-free, synchrotron, 
and dust. The total specific intensity integrated along the line of sight from 
these sources is (following the convention in (Tegmark Sz Efstatiou 1996)) 

'(270.2 MJy sr" 1 ) (p§L) 
_ +(24.8 MJy sr"' K~ l ) (-^f ff^w] 

(r ' )= (£)•"+ 4**Mfer 

+jt°m (%y +Kr} ( a) 

where Jo is a dimensionless map at a convenient reference frequency for the spe- 
cific physical component, Xd = hv/kTd ~ zv/417GHz for a dust temperature of 
T d = 20K, T is the temperature of the CMB blackbody, and x = z//56.8GHz. 
Subtracting the isotropic blackbody term and converting to brightness temper- 
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ature (in micro-Kelvin) gives 

T(r,u) = 5T(™ B \r) +J]A«(rf(r) (-Y^ (2) 

where is the spatially independent frequency dependence for the i th 

physical component, with spatial variations /?W(r). Note that the CMB is a 
frequency-independent, zero-mean field in these units. The frequency depen- 
dence of free-free emission is given by physics assuming typical electron densi- 
ties and temperatures, while the synchrotron and dust spectral indices can be 
spatially varying due to spatial dependence of the galactic magnetic field and 
dust grain properties. The data returned from an experiment will be the inte- 
grated brightness temperature over the bandpass of the frequency channels of 
the instrument (with center frequency v{) with additive noise rj(r,Ui) giving 

T obs (r, n) = 5T^ CMB \r) + J2 A^\u^\r) (nY^ + r,(r, „ 4 ) (3) 

(note that the maps at each frequency are the result of an initial processing 
stage from the time series returned from the experiment excuting a particular 
scan strategy. We will not include details of this stage of analysis here.) 



3. Bayesian Formulation 

Given observations T t> s at the frequency channels of the instrument, we would 
like to recover the amplitude and spectral index for each of the assumed present 

physical components. The probability p[T i, s \Tq\ of the data given the un- 
derlying amplitude (r) and spectral indices /?W (r) is given by the statistics of 
the noise process and scan strategy. This probability is known as the likelihood, 
where log p[T obs \T^ , /?«] ~ -X 2 up to an additive constant (the normalization 
constant), and for pixel independent Gaussian noise 



X 



with N r 



(4) 



To find the underlying variables (T (r),/3^^(r)), we could simply try to 
minimize \ 2 ■ I n tli e limit that we have an overdetermined system with high 
signal-to-noise ratio, we can obtain good results with linear methods such as 
singular value decomposition (with the spectral index assumed known), or a 
non- linear x 2 method such as used by (Brandt et al. 1994). However, as noted 
in (Brandt et al. 1994), the recovery of the amplitude and spectral index for all 
the physical components in the presence of noise is numerically unstable. Some 
means of regularizing the solution is needed. 

There are several stategies that can be employed to regularize solutions. 
(Brandt et al. 1994) discuss two reasonable simplifications - either assume the 
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contribution of a particular physical component is negligible at specific frequency 
channels, or find the spectral index from the average emission on patches of sky 
10° x 10°, and then find the amplitudes at all pixels holding the spectral index 
fixed within the given patch of sky 

Wiener filtering is another strategy that has been demonstrated successfully 
on simulated foregrounds, and is approporiate in cases where we have informa- 
tion about the spatial power spectrum of the foregrounds. The original method 
of (Tegmark & Efstatiou 1996) assumes the foregrounds are uncorrelated, with 
a power spectrum of the form C\ oc The Wiener matrix W is constructed so 

that Tq = WT b s , and the expected value of the residual error is a minimum, 
giving the solution 



T 



(0 



-i 



A T N~ 1 A + C- 1 A T N- 1 T obs (5) 



where W 



-l 



A T N 1 A + C 1 A T N 1 , A is the frequency response matrix as 

above, and dj = (T W (r;)T (i) (r,-)) is the power spectrum of the i th foreground 
component. 

As observed by (Hobson et al. 1998), solutions to the foreground separation 
problem can be generically formulated within Bayesian inference. In Bayesian 
inference, the posterior probability is interpreted as the figure of merit of a 
solution, quantifying our degree of confidence, and given by 

p[T (i) , /3« \T obs ] oc p[T obs |T (i) , /3«MT (l) , /?«] (6) 

The first term on the right-hand side is the likelihood as discussed above. 
The term p[T®,0®] is the prior density which effectively regularizes solutions 
through a statistical characterization of the foregrounds known or assumed a 
priori. Choosing a Gaussian prior with the assumed power spectrum of the 
foregrounds gives the Wiener filter solution as the maximum posterior solution 
(as pointed out by (Hobson et al. 1998)). The maximum entropy (MAXENT) 
method of (Hobson et al. 1998)) finds the maximum posterior solution with 
the log-prior given by the entropy. The method of (Brandt et al. 1994) effec- 
tively uses a uniform prior over an allowed interval for the fluctuations within a 
simplified model. 

We also note a slight variation on the Wiener filtering solution, in which 
we include previous observations as a spatial template for substraction. For 
example, denoting the previous observations extrapolated to the frequencies and 
resolution of interest T ot ^ er , we can use the prior (in matrix notation) 

- logp[T^\T other ] ~ YPf) Tc ~ X Vf) + £( T o W )A*(W) (7) 

i i 

where A quantifies the relative weight of the coupling of (T ot h er ) to inferences 
made about (Tq^). This prior gives the posterior 

- \ogp[T^\^\T obsj T other ] ~ x 2 + (T^) T C-\T^) + (T^)A(T other ) (8) 
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The solution which maximizes the posterior is then 



T (i) (r)= A T N- 1 A + C~ 1 * A T N~ 1 T ( 



' obs 



AT, 



other 



(9) 



Coupling to previous observations as a spatial template, and choosing the max- 
imum posterior solution, gives a subtraction of a particular foreground compo- 
nent through the filtered map AT ^ er . This method has the obvious danger of 
the false addition of correlation in the various components, however we might 
want to include information from previous observations at large angular scales, 
where instrumental noise and systematic effects might be typically very low. 



4. Separation Problem in a Multi-Resolution Setting 

There is a natural way to define fluctuation maps at various scales by adopting 
a multi-resolution approach (which will not be discussed in great detail here - 
we follow the discussion in (Langer et al. 1993)). A multi-resolution approach 
can be implemented with a smoothing function to recursively generate low-pass, 
or Gaussian, images and band-pass, or Laplacian, images according to 

T$\r,* j )=G*T®(r,<T j - 1 ) 

(10) 

LT W (r,a,_ 1 ) = L*T W (r,a,_ 1 ) 

where L = I — G. An example for the smoothing filter is the discrete approx- 
imation to a Gaussian provided by a separable filter G(i,j) = gigj, where for 
example we can take gi = [1/16,1/4,3/8,1/4,1/16]. Note that this recursive 
scheme gives a non-orthogonal and overcomplete wavelet decomposition. The 
importance of a non-orthogonal and overcomplete basis for image analysis (as 
opposed to image compression) has been stressed elsewhere (see (Langer et al. 
1993) and references therein for a discussion of this point). The basic motivation 
for a non-orthogonal, overcomplete basis is that the coefficients in an orthogonal 
wavelet bases can become diffuse upon translations of the original image. 

In such a scale-space decomposition, we have fluctuation maps at various 

SCcli6S 

LT(r,v,a)=L5T ( - CMB ^r,a)+Y,A ir) {is)LTV{r,v,a) (11) 

i 

where we have defined the effective scale-space spectral index as 

T^(r,u,a)=T^\r,a)^Y ){r,CT) (12) 

Note that the above implicitly defines the spectral index in terms of the difference 
in the log brightness temperature at spatial scale a of a physical component at 
two frequencies. 

The motivation for the above conventions is simply that, given the spectral 
indices for the physical components at a given scale, we can iteratively update 
the fluctuation maps (the simple separation problem traditionally considered). 
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However, when adjusting the spectral index, we need to work with the total 
brightness temperature at the next scale, since 

ctfWfaaj-ij-pWir,^)] = log \T®{r,v,<Tj) + LT»(r,j,,a,_ 1 ) 



-log 



(T «(r,a,) + LT «(r,a J _ 1 ))(^) 



*(0r ^ (13) 



where c v = log(^/V ). The scale-space data can then be written as 

Tcbsiw,*) = ST( CMB \r,a) + J2^ ) (^ j) (r,a) fnY^^ + ^r^a) 

( 14 ) 

where the noise rj(r, Vi,cr) has covariance matrix ({r t rG Gnj)) 1 . 

We explored a simple separation problem (standard galactic foregrounds 
with assumed known and spatially constant spectral index) within the above 
multi-resolution context as a test case in anticipation of more complex fore- 
ground scenarios. We considered only a single Gaussian and Laplacian compo- 
nent, so that 

T W (r,0)=LT W (r,0)+T W (r,a) (15) 

For pixel independent Gaussian noise, the majority of the noise power is con- 
tained in the Laplacian component of the total observed emission, and we ex- 
pect the foregrounds to be locally smooth with sparsely distributed dominant 
features in the map. We are therefore motivated to assume a quadratic prior in 
the Laplacian component 

-logp[LT ( V#T( LT o%0) 2 (16) 



where 6 3 is a parameter quantifying the relative weight between the prior and 
likelihood (see the review in (Geman 1990) for other applications of this type of 
prior). The maximum posterior solution is given by, 

T W = [A T N~ l A + B- H]- l A T N- 1 T obs (17) 

where the matrix elements B 3 rs = 0^S rs , and H$ s = J '[2G rs — (GG) rs ]. 

In order to implement the above however, an estimate for 0^) was needed. 
We first found a preliminary solution with x 2 minimization, equivalent to the 
above solution with 0i = 0. After this solution was obtained, we estimate 0i 
from the initial solution as 

j = 1 - T ^— (18) 

2((LT W )2) 

where the angle brackets denote the spatial average over the initial solution. 
After fixing J , we continue with an iterative improvement on the Laplacian 
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component according to 



LT^ (n + 1) = aA T N- 1 [T obs - AGT^ (n)] + MLT W (n) 

T W (n + l) = T (l) (n)+LT W (n + l)-LT (i) (n) (19) 

GT W (n + l) = G*T W (n + l) 

where the iteration matrix is M = I — a(A T N~ l A + B) and a k = 1/(A T N~ 1 A+ 
B)kk- The iteration relaxes to the desired maximum posterior solution. The 
motivation for this iteration is that, to a good approximation, the low-pass 
filtered x 2 solution is very close to the true low-pass filtered map (since the low- 
pass noise rms is drastically reduced). The needed improvement to the solution 
is in the Laplacian component, leading us to the iteration above. 

5. Parameter Estimation 

The approximation for 9^ gave good results for our specific simple test case 
above, but in general we will want a more consistent approach to fixing pa- 
rameters. Within Bayesian inference, the formal way to treat parameters is to 
include them in the posterior as another piece of information to be drawn from 
the data. In general, for a collection of weight parameters in the prior (such as 
the parameters 0^ in our simple test case above), the full posterior becomes 

p[4\ /3« 6\T obs ) oc p[T obs \T^\ /? W MT W , /?« \0]p[0] (20) 

where 6 is the vector of weight parameters. Note that the likelihood is not 
dependent on the parameters in the prior. The parameters can then be chosen 
from the density p[9,T obs ], obtained by marginalization over (T W ,/3«), so that 

p[6,T obs ] k P [9] J d[T^\(3^]p[T obs \T^,(3^]p[T^,p^\e] (21) 

The density p[9, T obs ] is typically a very sharply peaked function. A reasonable 
choice of parameter values is then the one that maximizes the likelihood 

6 = m^ eP [T obs \9] (22) 

This is the strategy employed by (Hobson et al. 1998) to fix the relative weight 
between an entropic prior and the likelihood. 

6. Generalization of Multi-Resolution Bayesian Inference 

Returning to the multi-resolution setting described before, we want to discuss 
Bayesian methods proceeding from coarse to fine scales. For notational pur- 
poses, let T^ j \r) = r (i) (r,Oj-) be the emission at the reference frequency of 

the i th phyiscal component at the j th scale. Let Tm'^(r) = T^(r,v m ,aj) be 
the emission in frequency channel m of the i th physical component at the j th 
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scale, and finally let /3^\r) = /?W(r, <7j) be similarly defined. We can factor 
the probability density according to 

p[4\^\T obs ] oc ||nP[ T o (lJ ^ (ij) l^ J+1) >/3 (lJ+1) ^o6 s ]j (23) 
Each term above has the form 

P[T ° tft |T ° ' To6sJoc MT^\^M h3+l \^ +1) ] 

where the first term is just the generalization of \ 2 at scale <jj, and the second 
term is the prior. We can factor the prior so that 

The last term p[T^' j) \T^' j+1) , (3^ +1 )} is the prior on the amplitudes at the 
reference frequency with the constraint that the low-pass filtered map of the 

finer scale amplitude agree with the coarse scale above, ' = G*T<p". We 

could use multi-resolution generalizations of the previous priors for this term, 
such as multi-resolution MAXENT methods (Starck & Pantin 1996, Starck et al. 

1998). The term p[(3^ |T (i,j) , T (i ' i+1) , ^( < J+ 1 )] is a prior on the spectral index 

given the amplitudes at the reference frequency, and can be formulated in terms 

(i i) 

of statistical characterizations of T m , the emission of the physical components 
at each frequency. 

One possible strategy in constructing a prior for is motivated by the ob- 
servation that the foregrounds are non-Gaussian. Realizations of a non-Gaussian 
process can be more compressable in a suitable basis (depending on the type 
of non-Gaussianity) , so that fewer wavelet coefficients are needed to reconstruct 
the image for a given rms error than in the Fourier basis. Therefore, we expect 

to be able to make good predictions about Tm from T m simply by assum- 
ing that there are no new features at the finer scale. This approximation is a 
good one over large scale-space intervals for dominant features. 

Dominant features can be associated with wavelet maxima, and tracking the 
location and amplitude of wavelet maxima across scale provide a way to contrain 

proposed fine scale maps Tm^ given Tm^ +l \ and therefore implicitly constrain 
variations in the spectral index. Wavelet maxima continuously merge in passing 
from fine to coarse scales, with no new maxima created at lower resolutions. 
This allows us to predict the location of maxima as we go to finer scales, with 
the creation of maxima only allowed if the data itself makes a strong case for it. 
In addition, wavelet maxima have been demonstrated to be a numerically stable 
representation of images, and a direct linear reconstruction solution given by 
the amplitude of the maxima (Mallat and Zhong 1991, Carmona et al. 1998). 
A generalization of the wavelet maxima representation is given by Basis Peruit 
(Chen et al. 1999), in which a decomposition is given by 

Tm"°\r) = X)fc Q; m,fcV'fc( r ) such that S[a^ k ] is a minimum (26) 



where ip k are the wavelet basis functions, and S = J2k\ a mk\ quantifies the 
sparseness of the wavelet coefficients. Basis Persuit has been shown to generate 
solutions that are very close to the wavelet maxima representation, demonstrat- 
ing that maxima can be understood as optimal representations in a specific sense 
for a wide variety of images. 

We therefore propose that the relevant variations in the spectral index to 
consider are the variations in the spectral index which increase the likelihood and 
which are generated from a sparse representation according to 

Cu[/3 d,J) _ ^y+U] = log (g> * £a« fc Vfc(r)) - loglF (27) 
where the wavelet coefficients should be consistent with the constraint 

T W + i) = r (M + D (^y M1>(r) = G j+1 * a^Mr^j (28) 

Formally then, we can give a prior for the spectral index as a marginal process 
on the "hidden" wavelet coefficients according to (with the content on the right 
side of the condition bar implied in the above) 

p[^M l '^T^ +l \^\ = J d[a%W^)\a%, ■ ■ .]p[*%\ ■ ■ ■] (29) 

where we can take 

-logp^')|a« fc ,--.]~ 

9 (c u [f3^) - - log (& * E fc <*® k Mrj) + logT ( ' ij) (^Y^) 

(30) 

and 

-\ogp[aV k \...]~S[aV k ] (31) 

To actually compute the marginalization will require the Metropolis algorithm 
or Gibbs sampler (Geman and Geman 1984). We could simply maximize the 
above, and always replace (3 by the maximum - various approximations will have 
to be numerically experimented with to see what works. 

By constraining the emission of a physical component at each frequency 
to be given according to a sparse representation, we have greatly reduced the 
effective degrees of freedom. This approach exploits non-Gaussianity, as non- 
Gaussian features will typically have long scale-space lifetimes and sparse rep- 
resentations in a wavelet basis. The numerical implementation of the above is, 
needless to say, much more complicated than the linear deterministic solutions 
discussed previously, driving us to stochastic relaxation. The local character 
of the posterior however, gives a conditional probability structure that enables 
various degrees of freedom to be adjusted in parallel. 
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7. Conclusions 

Bayesian inference does not provide any single method for the separation of 
foregrounds and the CMB, but instead is a framework within which methods 
can be formulated, and in the process, explicitly reveal any assumptions made. 
The specific method to be used depends on what information is desired. Vari- 
ous analysis approaches can be understood within a Bayesian framework as the 
"clean sky" limit, or the "high SNR" limit, or the "only one suspected fore- 
ground component in these frequencies" limit, etc. It seems that a unified view 
of analysis would be important, expecially when comparing data returned from 
different experiments. Probably the best way to proceed with actual data is to 
attack with any 'reasonable' Bayesian approach imaginable. With tools such as 
a multi-resolution approach and stochastic relaxation, we can attempt inference 
within the context of physically motivated models that address potential com- 
plexities in the foregrounds. Solutions that are consistent with approaches of 
varying complexity can be considered robust. However, discrepancies in infer- 
ences obtained with various methods provide an opportunity to learn about the 
data itself, and point the way to needed follow up studies and future experiments. 
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